Visual annotations and a supervised learning approach for evaluating and calibrating ChIP-seq peak detectors

Authors

Toby Dylan Hocking

Patricia Goerner-Potvin

Andreanne Morin

Xiaojian Shao

Guillaume Bourque

Abstract

Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which method and what parameters are optimal for any given data set. In contrast, peaks can easily be located by visual inspection of profile data in a genome browser. We thus propose a supervised machine learning approach to ChIP-seq data analysis, using annotated regions that encode an expert’s qualitative judgments about which regions contain or do not contain peaks. The main idea is to manually annotate a small subset of the genome, and then learn a model that makes consistent predictions on the rest of the genome. We show how our method can be used to quantitatively calibrate and benchmark the performance of peak detection algorithms on specific data sets. We compare several peak detectors on 7 annotated region data sets, spanning 2 histone marks, 4 expert annotators, and several different cell types. In these data, the macs algorithm performed best for a narrow peak histone profile (H3K4me3), while the hmcan.broad algorithm performed best for a broad histone profile (H3K36me3). Our benchmark annotated region data sets can be downloaded from a public website, and an R package for computing the annotation error is available on GitHub.
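
The annotation error used above to benchmark and calibrate peak detectors can be illustrated with a minimal Python sketch. It assumes a simplified two-label scheme (regions labeled "peaks" or "noPeaks"), half-open genomic coordinates, and hypothetical names (`overlaps`, `annotation_error`, `calibrate`); the label set and error definition in the authors' R package may differ. Calibration here simply means choosing, among peak lists produced at several parameter values, the one with the fewest errors on the labeled regions.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout (an assumption, not taken from the paper):
#   a labeled region is (chrom, start, end, label), label in {"peaks", "noPeaks"}
#   a predicted peak is (chrom, start, end); coordinates are half-open [start, end)
Region = Tuple[str, int, int, str]
Peak = Tuple[str, int, int]


def overlaps(region: Region, peak: Peak) -> bool:
    """True if the peak shares at least one base with the labeled region."""
    chrom, start, end, _ = region
    p_chrom, p_start, p_end = peak
    return chrom == p_chrom and p_start < end and start < p_end


def annotation_error(regions: List[Region], peaks: List[Peak]) -> Dict[str, int]:
    """Count label violations for one set of predicted peaks.

    Simplified two-label rule (an assumption for this sketch):
      - a "noPeaks" region overlapped by any peak is a false positive;
      - a "peaks" region overlapped by no peak is a false negative.
    """
    fp = fn = 0
    for region in regions:
        hit = any(overlaps(region, peak) for peak in peaks)
        label = region[3]
        if label == "noPeaks" and hit:
            fp += 1
        elif label == "peaks" and not hit:
            fn += 1
    return {"fp": fp, "fn": fn, "errors": fp + fn}


def calibrate(regions: List[Region], peaks_by_param: Dict[str, List[Peak]]) -> str:
    """Return the parameter value whose peaks make the fewest annotation errors.

    Ties go to the first parameter in dict insertion order (min keeps the first).
    """
    return min(
        peaks_by_param,
        key=lambda param: annotation_error(regions, peaks_by_param[param])["errors"],
    )


# Toy example: three labeled regions on one chromosome, and peaks from a
# hypothetical detector run at three made-up parameter settings.
regions = [
    ("chr1", 100, 200, "peaks"),
    ("chr1", 300, 400, "noPeaks"),
    ("chr1", 500, 600, "peaks"),
]
peaks_by_param = {
    "strict": [("chr1", 150, 180)],  # misses the 500-600 "peaks" region
    "medium": [("chr1", 150, 180), ("chr1", 520, 560)],  # no errors
    "loose": [("chr1", 150, 180), ("chr1", 350, 360), ("chr1", 520, 560)],  # peak in "noPeaks" region
}
print(calibrate(regions, peaks_by_param))  # prints medium
```

Because only the labeled regions are scored, the same procedure could rank different detectors as well as parameter values, by passing one peak list per detector instead of one per parameter setting.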

Similar articles

Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning

Motivation Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis...

Genome annotation test with validation on transcription start site and ChIP-Seq for Pol-II binding data

MOTIVATION Many ChIP-Seq experiments are aimed at developing gold standards for determining the locations of various genomic features such as transcription start or transcription factor binding sites on the whole genome. Many such pioneering experiments lack rigorous testing methods and adequate 'gold standard' annotations to compare against as they themselves are the most reliable source of em...

Efficient Labelling of Pedestrian Supervisions

Object detection is a fundamental goal to achieve intelligent visual perception by computers due to the fact that objects are the basic building blocks to achieve higher level image understanding. Among the numerous categories of objects in the real-world, pedestrians are among the most important due to several potential benefits brought about by successful pedestrian detection. Often, pedestri...

Webly-Supervised Learning of Multimodal Video Detectors

Given any complicated or specialized video content search query, e.g. "Batkid (a kid in batman costume)" or "destroyed buildings", existing methods require manually labeled data to build detectors for searching. We present a demonstration of an artificial intelligence application, Webly-labeled Learning (WELL) that enables learning of ad-hoc concept detectors over unlimited Internet videos with...

A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs

Chromatin immunoprecipitation (ChIP) followed by high throughput sequencing (ChIP-seq) is rapidly becoming the method of choice for discovering cell-specific transcription factor binding locations genome wide. By aligning sequenced tags to the genome, binding locations appear as peaks in the tag profile. Several programs have been designed to identify such peaks, but program evaluation has been...

Publication date: 2014
